Closed Bug 1506910 Opened 7 years ago Closed 5 years ago

Data race between mozPoisonValue and mozPoisonValueInit

Categories

(Core :: MFBT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
84 Branch
Tracking Status
firefox84 --- fixed

People

(Reporter: ytausky, Assigned: Gankra, NeedInfo)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

Found while running TSan: 0:16.46 pid:7871 ================== 0:16.46 pid:7871 WARNING: ThreadSanitizer: data race (pid=7960) 0:16.46 pid:7871 Write of size 8 at 0x55ffcff04308 by main thread: 0:16.46 pid:7871 #0 mozPoisonValueInit /home/ytausky/dev/mozilla-central/mfbt/Poison.cpp:211:23 (firefox+0xec8e4) 0:16.46 pid:7871 #1 NS_InitXPCOM2 /home/ytausky/dev/mozilla-central/xpcom/build/XPCOMInit.cpp:492:3 (libxul.so+0x14dd711) 0:16.46 pid:7871 #2 XRE_InitEmbedding2(nsIFile*, nsIFile*, nsIDirectoryServiceProvider*) /home/ytausky/dev/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:195:8 (libxul.so+0x73498b3) 0:16.46 pid:7871 #3 mozilla::ipc::ScopedXREEmbed::Start() /home/ytausky/dev/mozilla-central/ipc/glue/ScopedXREEmbed.cpp (libxul.so+0x1dc7fb1) 0:16.46 pid:7871 #4 mozilla::dom::ContentProcess::Init(int, char**) /home/ytausky/dev/mozilla-central/dom/ipc/ContentProcess.cpp:297:13 (libxul.so+0x4da041e) 0:16.46 pid:7871 #5 XRE_InitChildProcess(int, char**, XREChildData const*) /home/ytausky/dev/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:754:21 (libxul.so+0x734a106) 0:16.46 pid:7871 #6 mozilla::BootstrapImpl::XRE_InitChildProcess(int, char**, XREChildData const*) /home/ytausky/dev/mozilla-central/toolkit/xre/Bootstrap.cpp:69:12 (libxul.so+0x7355a07) 0:16.46 pid:7871 #7 content_process_main(mozilla::Bootstrap*, int, char**) /home/ytausky/dev/mozilla-central/browser/app/../../ipc/contentproc/plugin-container.cpp:50:30 (firefox+0xc3d71) 0:16.46 pid:7871 #8 main /home/ytausky/dev/mozilla-central/browser/app/nsBrowserApp.cpp:287 (firefox+0xc3d71) 0:16.46 pid:7871 Previous read of size 8 at 0x55ffcff04308 by thread T2: 0:16.46 pid:7871 #0 mozPoisonValue /home/ytausky/dev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Poison.h:30:10 (libxul.so+0x1d3d24b) 0:16.46 pid:7871 #1 mozWritePoison /home/ytausky/dev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Poison.h:41 (libxul.so+0x1d3d24b) 0:16.46 pid:7871 #2 mozilla::detail::OutOfLinePoisoner<112ul>::poison(void*, unsigned long) /home/ytausky/dev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Maybe.h:76 (libxul.so+0x1d3d24b) 0:16.47 pid:7871 #3 void mozilla::detail::PoisonObject<IPC::Message>(IPC::Message*) /home/ytausky/dev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Maybe.h:85 (libxul.so+0x1d3d24b) 0:16.47 pid:7871 #4 mozilla::detail::MaybePoisoner<IPC::Message>::poison(void*) /home/ytausky/dev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Maybe.h:99 (libxul.so+0x1d3d24b) 0:16.47 pid:7871 #5 mozilla::Maybe<IPC::Message>::poisonData() /home/ytausky/dev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Maybe.h:175 (libxul.so+0x1d3d24b) 0:16.47 pid:7871 #6 mozilla::Maybe<IPC::Message>::reset() /home/ytausky/dev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Maybe.h:491 (libxul.so+0x1d3d24b) 0:16.47 pid:7871 #7 IPC::Channel::ChannelImpl::ProcessIncomingMessages() /home/ytausky/dev/mozilla-central/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:589 (libxul.so+0x1d3d24b) 0:16.47 pid:7871 #8 IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) /home/ytausky/dev/mozilla-central/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:830:10 (libxul.so+0x1d3de81) 0:16.47 pid:7871 #9 base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) /home/ytausky/dev/mozilla-central/ipc/chromium/src/base/message_pump_libevent.cc:255:14 (libxul.so+0x1d175ef) 0:16.47 pid:7871 #10 event_persist_closure /home/ytausky/dev/mozilla-central/ipc/chromium/src/third_party/libevent/event.c:1580:9 (libxul.so+0x1d84163) 0:16.47 pid:7871 #11 event_process_active_single_queue /home/ytausky/dev/mozilla-central/ipc/chromium/src/third_party/libevent/event.c:1639 (libxul.so+0x1d84163) 0:16.47 pid:7871 #12 event_process_active /home/ytausky/dev/mozilla-central/ipc/chromium/src/third_party/libevent/event.c (libxul.so+0x1d6e038) 0:16.47 pid:7871 #13 event_base_loop /home/ytausky/dev/mozilla-central/ipc/chromium/src/third_party/libevent/event.c:1961 (libxul.so+0x1d6e038) 0:16.47 pid:7871 #14 base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) /home/ytausky/dev/mozilla-central/ipc/chromium/src/base/message_pump_libevent.cc (libxul.so+0x1d1792e) 0:16.47 pid:7871 #15 MessageLoop::RunInternal() /home/ytausky/dev/mozilla-central/ipc/chromium/src/base/message_loop.cc:325:10 (libxul.so+0x1d1459f) 0:16.47 pid:7871 #16 MessageLoop::RunHandler() /home/ytausky/dev/mozilla-central/ipc/chromium/src/base/message_loop.cc:318 (libxul.so+0x1d1459f) 0:16.47 pid:7871 #17 MessageLoop::Run() /home/ytausky/dev/mozilla-central/ipc/chromium/src/base/message_loop.cc:298 (libxul.so+0x1d1459f) 0:16.47 pid:7871 #18 base::Thread::ThreadMain() /home/ytausky/dev/mozilla-central/ipc/chromium/src/base/thread.cc:198:16 (libxul.so+0x1d2b072) 0:16.47 pid:7871 #19 ThreadFunc(void*) /home/ytausky/dev/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:40:13 (libxul.so+0x1d1d112) 0:16.47 pid:7871 Location is global 'gMozillaPoisonValue' of size 8 at 0x55ffcff04308 (firefox+0x000000f9b308) 0:16.47 pid:7871 Thread T2 'Chrome_~dThread' (tid=7963, running) created by main thread at: 0:16.47 pid:7871 #0 pthread_create <null> (firefox+0x2f296) 0:16.47 pid:7871 #1 (anonymous namespace)::CreateThread(unsigned long, bool, PlatformThread::Delegate*, unsigned long*) /home/ytausky/dev/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:127:14 (libxul.so+0x1d1c2ae) 0:16.47 pid:7871 #2 PlatformThread::Create(unsigned long, PlatformThread::Delegate*, unsigned long*) /home/ytausky/dev/mozilla-central/ipc/chromium/src/base/platform_thread_posix.cc:138 (libxul.so+0x1d1c2ae) 0:16.47 pid:7871 #3 base::Thread::StartWithOptions(base::Thread::Options const&) /home/ytausky/dev/mozilla-central/ipc/chromium/src/base/thread.cc:102:8 (libxul.so+0x1d2abfc) 0:16.47 pid:7871 #4 ChildThread::Run() /home/ytausky/dev/mozilla-central/ipc/chromium/src/chrome/common/child_thread.cc:27:12 (libxul.so+0x1d2cee2) 0:16.47 pid:7871 #5 ChildProcess::ChildProcess(ChildThread*) /home/ytausky/dev/mozilla-central/ipc/chromium/src/chrome/common/child_process.cc:20 (libxul.so+0x1d2cee2) 0:16.47 pid:7871 #6 mozilla::ipc::ProcessChild::ProcessChild(int) /home/ytausky/dev/mozilla-central/ipc/glue/ProcessChild.cpp:24:5 (libxul.so+0x1dc2e01) 0:16.47 pid:7871 #7 mozilla::dom::ContentProcess::ContentProcess(int) /home/ytausky/dev/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/dom/ContentProcess.h:31:7 (libxul.so+0x734a05c) 0:16.47 pid:7871 #8 XRE_InitChildProcess(int, char**, XREChildData const*) /home/ytausky/dev/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:722 (libxul.so+0x734a05c) 0:16.47 pid:7871 #9 mozilla::BootstrapImpl::XRE_InitChildProcess(int, char**, XREChildData const*) /home/ytausky/dev/mozilla-central/toolkit/xre/Bootstrap.cpp:69:12 (libxul.so+0x7355a07) 0:16.47 pid:7871 #10 content_process_main(mozilla::Bootstrap*, int, char**) /home/ytausky/dev/mozilla-central/browser/app/../../ipc/contentproc/plugin-container.cpp:50:30 (firefox+0xc3d71) 0:16.47 pid:7871 #11 main /home/ytausky/dev/mozilla-central/browser/app/nsBrowserApp.cpp:287 (firefox+0xc3d71) 0:16.47 pid:7871 SUMMARY: ThreadSanitizer: data race /home/ytausky/dev/mozilla-central/mfbt/Poison.cpp:211:23 in mozPoisonValueInit 0:16.47 pid:7871 ==================
Blocks: tsan

It's been nearly a year, it might be worth checking whether this still reproduces. This race looks pretty troubling if it still happens. With the layout poison value having been repurposed into a more widely used thing, maybe its assumptions about XPCOM initialization no longer hold.

(In reply to :dmajor from comment #1)

It's been nearly a year, it might be worth checking whether this still reproduces. This race looks pretty troubling if it still happens. With the layout poison value having been repurposed into a more widely used thing, maybe its assumptions about XPCOM initialization no longer hold.

This still reproduces on startup, I just hit it on my machine.

Flags: needinfo?(dmajor)

Waldo: this could be from the Maybe-poisoning in bug 1414901. The poison value, having its roots in the layout arena code, is only available after XPCOM init, but we use Maybes more universally than that.

Flags: needinfo?(dmajor) → needinfo?(jwalden)
Attached file tsan-backtrace.txt

Turned this suppression off in a try run to see if it's still a problem. It is. Modern backtrace attached.

Taking this as I think I understand the issue and it has me a bit nervous that it's happening.

Assignee: nobody → a.beingessner

Poison was setup at the start of xpcom init when that was assumed to be early enough.
Since then, Poison was added to Maybe, and Maybe has been used everywhere, including in
our channel implementation. As a result, poison was being used before it was initialized.

This basically meant our poison pointers were being replaced with null instead, which
dances into some more UB than accessing a page we have actually allocated. Also, tsan
noticed that accesses to the value were racing with the initializer actually being
called!

A (dynamic) static initializer forces the poison initialization as we can reasonably
hope without getting CallOnce or singleton patterns involved.

Other changes:

  • Cleaned up the outdated documentation for mozWritePoison (the alignment
    restriction was removed in Bug 1414901)
  • Removed the poison supression from TSan
Attachment #9182839 - Attachment description: Bug 1506910 - Initialize the poison page with a dynamic static initializer. r?glandium → Bug 1506910 - Initialize the poison page with a static initializer. r?glandium
Pushed by abeingessner@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ec86e24123dd Initialize the poison page with a static initializer. r=glandium,decoder
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: